{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d27f1082-cd10-405e-9570-6f0e934bba8b",
   "metadata": {},
   "source": [
    "# LlamaParse JSON Mode + Multimodal RAG\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/demo_json.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "This notebook shows you how to use LlamaParse JSON mode with LlamaIndex to build a simple multimodal RAG pipeline.\n",
    "\n",
    "Using JSON mode gives you back a list of json dictionaries, which contains both text and images. You can then download these images and use a multimodal model to extract information and index them."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a004db48-8d3f-421c-915a-477692f71b90",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "Define imports, env variables, global LLM/embedding models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc6a7a4b-b568-4db5-bcba-62f5c517ff3a",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index\n",
    "!pip install llama-index-core\n",
    "!pip install llama-index-llms-anthropic llama-index-multi-modal-llms-anthropic\n",
    "!pip install llama-index-embeddings-huggingface\n",
    "!pip install llama-parse"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0879301c-ff91-4431-941a-6c0ef7cd8fe2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# llama-parse is async-first, running the async code in a notebook requires the use of nest_asyncio\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()\n",
    "\n",
    "import os\n",
    "\n",
    "# API access to llama-cloud\n",
    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-\"\n",
    "\n",
    "# Using Anthropic API for embeddings/LLMs\n",
    "os.environ[\"ANTHROPIC_API_KEY\"] = \"sk-\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "391e2d95-5569-4d73-9f16-5b59d7326f8d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "llm = Anthropic(model=\"claude-3-opus-20240229\", temperature=0.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "700f48e8-8b52-41f3-90f9-144d5fdd5c52",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Settings\n",
    "\n",
    "Settings.llm = llm\n",
    "Settings.embed_model = \"local:BAAI/bge-small-en-v1.5\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b411d2ee-3e6b-45b0-b532-4a8e3abcdea0",
   "metadata": {},
   "source": [
    "## Load Data\n",
    "\n",
    "Let's load in the Uber 10Q report."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c39d408f-e885-4940-85c7-b09ca3bc7cb7",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf' -O './uber_10q_march_2022.pdf'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2f42af8-afb3-4b3b-82d3-6b332fb38aa4",
   "metadata": {},
   "source": [
    "## Using LlamaParse in JSON Mode for PDF Reading\n",
    "\n",
    "We show you how to run LlamaParse in JSON mode for PDF reading."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c9cd670-8229-4ad6-99a9-845bd82b7ec1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started parsing the file under job_id cf5a4f51-1af8-47f7-9b3d-80a905d06b89\n"
     ]
    }
   ],
   "source": [
    "from llama_parse import LlamaParse\n",
    "\n",
    "parser = LlamaParse(verbose=True)\n",
    "json_objs = parser.get_json_result(\"./uber_10q_march_2022.pdf\")\n",
    "json_list = json_objs[0][\"pages\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b26d21d1-05b5-4f49-b937-c13106a84015",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.schema import TextNode\n",
    "from typing import List\n",
    "\n",
    "\n",
    "def get_text_nodes(json_list: List[dict]):\n",
    "    text_nodes = []\n",
    "    for idx, page in enumerate(json_list):\n",
    "        text_node = TextNode(text=page[\"text\"], metadata={\"page\": page[\"page\"]})\n",
    "        text_nodes.append(text_node)\n",
    "    return text_nodes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "364a3276-d2db-4aee-9bc6-617ffd726d25",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_nodes = get_text_nodes(json_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fe2e911-0393-42e8-a233-65639cdbebc4",
   "metadata": {},
   "source": [
    "## Extract/Index images from image dicts\n",
    "\n",
    "Here we use a multimodal model to extract and index images from image dictionaries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36012145-5521-4ddb-a53e-df9ebd1ca8dd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mkdir: llama2_images: File exists\n"
     ]
    }
   ],
   "source": [
    "# call get_images on parser, convert to ImageDocuments\n",
    "!mkdir llama2_images\n",
    "\n",
    "from llama_index.core.schema import ImageDocument\n",
    "from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal\n",
    "\n",
    "\n",
    "def get_image_text_nodes(json_objs: List[dict]):\n",
    "    \"\"\"Extract out text from images using a multimodal model.\"\"\"\n",
    "    anthropic_mm_llm = AnthropicMultiModal(max_tokens=300)\n",
    "    image_dicts = parser.get_images(json_objs, download_path=\"llama2_images\")\n",
    "    image_documents = []\n",
    "    img_text_nodes = []\n",
    "    for image_dict in image_dicts:\n",
    "        image_doc = ImageDocument(image_path=image_dict[\"path\"])\n",
    "        response = anthropic_mm_llm.complete(\n",
    "            prompt=\"Describe the images as alt text\",\n",
    "            image_documents=[image_doc],\n",
    "        )\n",
    "        text_node = TextNode(text=str(response), metadata={\"path\": image_dict[\"path\"]})\n",
    "        img_text_nodes.append(text_node)\n",
    "    return img_text_nodes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38f25045-6102-4920-9cd0-42b0ae6c872f",
   "metadata": {},
   "outputs": [],
   "source": [
    "image_text_nodes = get_image_text_nodes(json_objs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4683c97a-da06-408a-9fe9-7e3c0aceb77d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The image shows a bar graph titled \"Monthly Active Platform Consumers (in millions)\". The graph displays data from Q2 2020 to Q1 2022 over 8 quarters. The number of monthly active platform consumers starts at 55 million in Q2 2020 and steadily increases each quarter, reaching 115 million by Q1 2022. The graph illustrates consistent quarter-over-quarter growth in this metric over the nearly 2 year time period shown.'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "image_text_nodes[0].get_content()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cfdf6db-381c-4e53-a0fb-e7670f75e0d5",
   "metadata": {},
   "source": [
    "## Build Index across image and text nodes\n",
    "\n",
    "Here we build a vector index across both text nodes and text nodes extracted from images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "939aec6c-064a-4319-b2dc-70cc4a304c06",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import VectorStoreIndex\n",
    "\n",
    "index = VectorStoreIndex(text_nodes + image_text_nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "529340d5-9319-4cdf-8ee1-bbd01ed00226",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81d7ff30-5a87-44da-880d-4b1f41434d90",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The bar graph titled \"Monthly Active Platform Consumers (in millions)\" shows the number of monthly active consumers on Uber's platform over a period of 8 quarters from Q2 2020 to Q1 2022. \n",
      "\n",
      "The graph indicates steady quarter-over-quarter growth in this metric, starting at 55 million monthly active platform consumers in Q2 2020 and increasing each quarter to reach 115 million by Q1 2022. This represents consistent growth in Uber's user base on their platform over the nearly 2 year period shown in the graph.\n"
     ]
    }
   ],
   "source": [
    "# ask question over image!\n",
    "response = query_engine.query(\n",
    "    \"What does the bar graph titled 'Monthly Active Platform Consumers' show?\"\n",
    ")\n",
    "print(str(response))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4f14ad8-6bfd-49d9-b3d5-7215cf0e4ac1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the context provided, some of the main risk factors for Uber include:\n",
      "\n",
      "- A significant percentage of Uber's bookings come from large metropolitan areas, which could be negatively impacted by various economic, social, weather, regulatory and other conditions, including COVID-19.\n",
      "\n",
      "- Uber may fail to successfully offer autonomous vehicle technologies on its platform or these technologies may not perform as expected. \n",
      "\n",
      "- Retaining and attracting high-quality personnel is important for Uber's business and continued attrition could adversely impact the company.\n",
      "\n",
      "- Security breaches, data privacy issues, cyberattacks and unauthorized access to Uber's proprietary data and systems pose risks.\n",
      "\n",
      "- Uber is subject to climate change risks, both physical and transitional, that could adversely impact its business if not managed properly. \n",
      "\n",
      "- Uber relies on third parties for open marketplaces to distribute its platform and software, and interference from these third parties could harm its business.\n",
      "\n",
      "- Uber will require additional capital to support its growth and this capital may not be available on reasonable terms.\n",
      "\n",
      "- Acquisitions and integrations carry risks if Uber is unable to successfully identify and integrate suitable businesses.\n",
      "\n",
      "- Extensive government regulations around payments, financial services, data privacy and other areas pose compliance risks and challenges for Uber's business model in certain jurisdictions.\n"
     ]
    }
   ],
   "source": [
    "# ask question over text!\n",
    "response = query_engine.query(\"What are the main risk factors for Uber?\")\n",
    "print(str(response))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama-parse-aNC435Vv-py3.10",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
